

PiRank: Scalable Learning To Rank via Differentiable Sorting

Neural Information Processing Systems

A key challenge with machine learning approaches for ranking is the gap between the performance metrics of interest and the surrogate loss functions that can be optimized with gradient-based methods. This gap arises because ranking metrics typically involve a sorting operation which is not differentiable w.r.t. the model parameters. Prior works have proposed surrogates that are loosely related to ranking metrics or simple smoothed versions thereof, and often fail to scale to real-world applications. We propose PiRank, a new class of differentiable surrogates for ranking, which employ a continuous, temperature-controlled relaxation to the sorting operator based on NeuralSort [1]. We show that PiRank exactly recovers the desired metrics in the limit of zero temperature and further propose a divide-and-conquer extension that scales favorably to large list sizes, both in theory and practice. Empirically, we demonstrate the role of larger list sizes during training and show that PiRank significantly improves over comparable approaches on publicly available Internet-scale learning-to-rank benchmarks.
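The core device here, the temperature-controlled sorting relaxation, can be made concrete with a small NumPy sketch of the NeuralSort operator the abstract builds on. This is an illustration of the relaxation itself, not PiRank's full loss construction; the function name and example scores are hypothetical.

```python
import numpy as np

def neuralsort_relaxation(s, tau=1.0):
    # NeuralSort-style relaxation: returns an (n, n) row-stochastic matrix
    # that approximates the permutation sorting s in descending order.
    # As tau -> 0, each row approaches a one-hot vector.
    n = len(s)
    A = np.abs(s[:, None] - s[None, :])          # pairwise |s_j - s_k|
    row_sums = A.sum(axis=1)                     # A_s @ 1
    i = np.arange(1, n + 1)[:, None]             # 1-based row index
    logits = ((n + 1 - 2 * i) * s[None, :] - row_sums[None, :]) / tau
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([0.1, 2.0, 1.0])
P = neuralsort_relaxation(scores, tau=0.01)      # near-zero temperature
print(P.argmax(axis=1))                          # row i picks the i-th largest score
```

Because every entry of `P` is a softmax output, gradients flow through it to the scores, which is what lets ranking metrics built on top of it be optimized directly.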


A Solver-free Framework for Scalable Learning in Neural ILP Architectures

Neural Information Processing Systems

There is a recent focus on designing architectures that have an Integer Linear Programming (ILP) layer within a neural model (referred to as Neural ILP in this paper). Neural ILP architectures are suitable for pure reasoning tasks that require data-driven constraint learning, or for tasks requiring both perception (neural) and reasoning (ILP). A recent state-of-the-art approach for end-to-end training of Neural ILP explicitly defines gradients through the ILP black box (Paulus et al., 2021); this trains extremely slowly, owing to a call to the underlying ILP solver for every training data point in a minibatch. In response, we present an alternative training strategy that is solver-free, i.e., it does not call the ILP solver at all at training time. Neural ILP has a set of trainable hyperplanes (for cost and constraints in the ILP), together representing a polyhedron.
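The solver-free idea of learning constraint hyperplanes directly can be sketched with a toy example. This is an assumption-laden illustration, not the paper's actual objective: it uses a simple hinge-style update to move hyperplanes (A, b) so that an assumed ground-truth solution stays feasible while an assumed negative sample becomes infeasible, with no solver call in the loop.

```python
import numpy as np

# Toy sketch (not the paper's exact loss): adjust K constraint hyperplanes
# (A, b) so that a known feasible solution x_pos satisfies A @ x >= b
# while a sampled negative x_neg ends up violating at least one constraint.
K, n = 4, 3
A, b = np.zeros((K, n)), np.zeros(K)
x_pos = np.array([1.0, 0.0, 1.0])   # assumed ground-truth ILP solution
x_neg = np.array([0.0, 1.0, 0.0])   # assumed negative sample
lr = 0.1

for _ in range(100):
    dA, db = np.zeros_like(A), np.zeros_like(b)
    # hinge penalty: every constraint must hold for the positive solution
    for k in np.where(b - A @ x_pos > 0)[0]:
        dA[k] -= x_pos
        db[k] += 1.0
    # if the negative is still feasible, tighten its smallest-slack constraint
    slack = A @ x_neg - b
    if (slack >= 0).all():
        k = int(np.argmin(slack))
        dA[k] += x_neg
        db[k] -= 1.0
    A -= lr * dA
    b -= lr * db
```

The updates stop once the polyhedron contains the positive solution and excludes the negative, which is the separation property the trained constraint layer needs.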


PyG 2.0: Scalable Learning on Real World Graphs

Fey, Matthias, Sunil, Jinu, Nitta, Akihiro, Puri, Rishi, Shah, Manan, Stojanovič, Blaž, Bendias, Ramona, Barghi, Alexandria, Kocijan, Vid, Zhang, Zecheng, He, Xinwei, Lenssen, Jan Eric, Leskovec, Jure

arXiv.org Artificial Intelligence

PyG (PyTorch Geometric) has evolved significantly since its initial release, establishing itself as a leading framework for Graph Neural Networks. In this paper, we present PyG 2.0 (and its subsequent minor versions), a comprehensive update that introduces substantial improvements in scalability and real-world application capabilities. We detail the framework's enhanced architecture, including support for heterogeneous and temporal graphs, scalable feature/graph stores, and various optimizations, enabling researchers and practitioners to tackle large-scale graph learning problems efficiently. In recent years, PyG has supported graph learning in a large variety of application areas, which we summarize, while providing a deep dive into the important areas of relational deep learning and large language modeling.



Petuum Awarded OSDI 2021 Best Paper for Goodput-Optimized Deep Learning Research

#artificialintelligence

Petuum's CASL research and engineering team has won this year's OSDI 2021 Best Paper Award. The effort is led by Dr. Aurick Qiao, who heads the Composability, Automatic, and Scalable Learning (CASL) team at Petuum. Dr. Qiao received the Jay Lepreau Best Paper Award at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI) 2021 for the paper he co-authored, Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, which captures work implemented using one of CASL's key components, AdaptDL. Pollux can be used today via AdaptDL, which integrates with PyTorch and Microsoft NNI, with Ray support coming soon. Pollux, as implemented by AdaptDL, improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing interdependent factors both at the per-job level and at the cluster-wide level.


Fast and scalable learning of neuro-symbolic representations of biomedical knowledge

Agibetov, Asan, Samwald, Matthias

arXiv.org Artificial Intelligence

In this work we address the problem of fast and scalable learning of neuro-symbolic representations for general biological knowledge. Based on a recently published comprehensive biological knowledge graph (Alshahrani, 2017) that was used for demonstrating neuro-symbolic representation learning, we show how to train fast (under 1 minute) log-linear neural embeddings of the entities. We utilize these representations as inputs for machine learning classifiers to enable important tasks such as biological link prediction. Classifiers are trained by concatenating learned entity embeddings to represent entity relations, and training classifiers on the concatenated embeddings to discern true relations from automatically generated negative examples. Our simple embedding methodology greatly improves on classification error compared to previously published state-of-the-art results, yielding a maximum increase of $+0.28$ F-measure and $+0.22$ ROC AUC scores for the most difficult biological link prediction problem. Finally, our embedding approach is orders of magnitude faster to train ($\leq$ 1 minute vs. hours), much more economical in terms of embedding dimensions ($d=50$ vs. $d=512$), and naturally encodes the directionality of the asymmetric biological relations, which can be controlled by the order in which we concatenate the embeddings.
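The concatenation trick for direction-aware relation features is easy to sketch. The entity names and random vectors below are illustrative stand-ins for the trained log-linear embeddings; only the construction (concatenating head and tail embeddings in order) reflects the abstract.

```python
import numpy as np

# Hedged sketch: represent a relation (head, tail) by concatenating the two
# entity embeddings. Concatenation order encodes direction, so (h, t) and
# (t, h) yield different feature vectors for asymmetric relations.
rng = np.random.default_rng(42)
d = 50                                       # embedding dim used in the paper
emb = {e: rng.normal(size=d) for e in ["geneA", "diseaseB"]}

def relation_features(head, tail):
    return np.concatenate([emb[head], emb[tail]])   # shape (2d,)

fwd = relation_features("geneA", "diseaseB")
rev = relation_features("diseaseB", "geneA")
print(fwd.shape, np.allclose(fwd, rev))      # distinct features per direction
```

A downstream classifier trained on such vectors can then score true relations against sampled negatives, as the abstract describes.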


Scalable Learning of Non-Decomposable Objectives

Eban, Elad, Schain, Mariano, Mackey, Alan, Gordon, Ariel, Saurous, Rif A., Elidan, Gal

arXiv.org Machine Learning

Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the $F_\beta$ score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest. In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline.
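One flavor of the "building block bounds" idea can be sketched as follows: replace the non-differentiable true-positive and false-positive counts with hinge-based bounds, then combine them into a bound on a ranking metric such as precision. This is an assumed instance of the general recipe, not the paper's exact formulation.

```python
import numpy as np

def hinge(z):
    # convex upper bound on the 0-1 loss indicator [z <= 0]
    return np.maximum(0.0, 1.0 - z)

# model scores f(x) and labels y in {+1, -1} (toy data)
f = np.array([2.0, 0.5, -1.5, -0.2])
y = np.array([1, 1, -1, -1])
pos, neg = f[y == 1], f[y == -1]

tp_lb = np.sum(1.0 - hinge(pos))   # differentiable lower bound on true positives
fp_ub = np.sum(hinge(-neg))        # differentiable upper bound on false positives
prec_lb = tp_lb / (tp_lb + fp_ub)  # lower bound on precision
print(tp_lb, fp_ub, prec_lb)
```

Because each piece is a smooth (sub)differentiable bound on a count, the combined quantity can be optimized at scale with standard gradient methods, which is the scalability advantage the abstract claims.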


Scalable Learning for Structure in Markov Logic Networks

Sun, Zhengya (Chinese Academy of Sciences) | Wei, Zhuoyu (Chinese Academy of Sciences) | Wang, Jue (Chinese Academy of Sciences) | Hao, Hongwei (Chinese Academy of Sciences)

AAAI Conferences

Markov Logic Networks (MLNs) provide a unifying framework that incorporates first-order logic and probability. However, learning the structure of MLNs is a computationally hard task due to the large search space and the intractable clause evaluation. In this paper, we propose a random walk-based approach to learn MLN structure in a scalable manner. It uses the interactions existing among the objects to constrain the search space of candidate clauses. Specifically, we obtain a representative subset of simple paths by sampling from all sequences of distinct objects. We then transform each sampled path into possible ground atoms, and use them to form clauses. Based on the resulting ground network, we finally attach a set of weights to the clauses by optimizing L1-constrained conditional likelihood. The experimental results demonstrate that our approach performs favorably compared to previous approaches.
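The path-sampling step can be illustrated with a short sketch: random walks over a small object graph that keep only simple paths (distinct objects), which would then be lifted into candidate clauses. The graph, names, and walk parameters here are illustrative, not from the paper.

```python
import random

# Hedged sketch of sampling simple paths among interacting objects.
graph = {"alice": ["bob", "carol"], "bob": ["alice", "dave"],
         "carol": ["alice", "dave"], "dave": ["bob", "carol"]}

def sample_simple_path(start, max_len, rng):
    # random walk that never revisits an object, so the path stays simple
    path = [start]
    while len(path) < max_len:
        unvisited = [n for n in graph[path[-1]] if n not in path]
        if not unvisited:
            break
        path.append(rng.choice(unvisited))
    return path

rng = random.Random(0)
paths = [sample_simple_path("alice", 4, rng) for _ in range(5)]
print(paths)
```

Restricting clause generation to objects that actually co-occur on such paths is what keeps the candidate space tractable.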